On Knowledge-Based Machine Translation
Authors
Abstract
This paper describes the design of the knowledge representation medium used for representing concepts and assertions, respectively, in a subworld chosen for a knowledge-based machine translation system. This design is used in the TRANSLATOR machine translation project. The knowledge representation language, or interlingua, has two components, DIL and TIL. DIL stands for 'dictionary of interlingua' and describes the semantics of a subworld. TIL stands for 'text of interlingua' and is responsible for producing an interlingua text, which represents the meaning of an input text in the terms of the interlingua. We maintain that involved analysis of various types of linguistic and encyclopaedic meaning is necessary for the task of automatic translation. The mechanisms for extracting, manipulating and reproducing the meaning of texts will be reported in detail elsewhere. The linguistic (including the syntactic) knowledge about source and target languages is used by the mechanisms that translate texts into and from the interlingua. Since interlingua is an artificial language, we can (and do, through TIL) control the syntax and semantics of the allowed interlingua elements. The interlingua suggested for TRANSLATOR has a broader coverage than other knowledge representation schemata for natural language. It involves the knowledge about discourse, speech acts, focus, time, space and other facets of the overall meaning of texts.
1. Delimiting the Problem
TRANSLATOR explores the knowledge-based approach to machine translation. The basic translation strategy is to extract meaning from the input text in the source language, SL, represent this meaning in a language-independent semantic representation and then render this meaning in a target language, TL. The knowledge representation language used in such a set-up is called, for historical reasons, interlingua (henceforth, IL).
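The two-step strategy described above (SL text to IL, then IL to TL) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from TRANSLATOR: the toy concept dictionary `DIL`, the `ILClause` record, the word lists `SL_WORDS` and `TL_WORDS`, and the functions `analyze` and `generate` are hypothetical, and the naive agent-event-object word order merely stands in for the real analysis mechanisms, which the paper says are reported elsewhere.

```python
from dataclasses import dataclass, field

# Hypothetical toy "dictionary of interlingua": concept name -> semantic slots.
DIL = {
    "computer": {"isa": "device"},
    "print": {"isa": "action"},
    "document": {"isa": "information-object"},
}

# Hypothetical word lists linking SL (English) and TL (French) words to concepts.
SL_WORDS = {"computer": "computer", "prints": "print", "document": "document"}
TL_WORDS = {"computer": "ordinateur", "print": "imprime", "document": "document"}


@dataclass
class ILClause:
    """One clause of interlingua text: an event concept plus its role fillers."""
    event: str
    roles: dict = field(default_factory=dict)


def analyze(sl_sentence: str) -> ILClause:
    """SL -> IL. Assumes a three-concept sentence in agent-event-object order."""
    words = sl_sentence.lower().split()
    concepts = [SL_WORDS[w] for w in words if w in SL_WORDS]
    if len(concepts) != 3:
        raise ValueError("toy analyzer expects exactly three known concepts")
    agent, event, obj = concepts
    if DIL[event]["isa"] != "action":
        raise ValueError("second concept must be an action")
    return ILClause(event=event, roles={"agent": agent, "object": obj})


def generate(clause: ILClause) -> str:
    """IL -> TL. Reads only the IL clause and never sees the SL sentence."""
    order = [clause.roles["agent"], clause.event, clause.roles["object"]]
    return " ".join(TL_WORDS[concept] for concept in order)


il_text = analyze("The computer prints the document")
print(il_text)
print(generate(il_text))
```

The point of the sketch is the interface: `generate` has no access to the source sentence, so any information needed in the TL output has to be carried by the IL representation itself.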
TRANSLATOR's ultimate aim is achieving good quality automatic translation in a non-trivial subworld and its corresponding sublanguage. The philosophy of TRANSLATOR aims at the independence of the process of translation from human intervention in the form of the traditional pre- and/or post-editing. Interaction during the process of translation can be accommodated by this philosophy, but only as a temporary measure. Interactive modules will be plugged into the system pending the development of automatic modules for performing the various tasks as well as more powerful inference engines and representation schemata. This is a device that facilitates early testing of a system even before all the modules are actually built. Another advantage of this strategy is that the system becomes 'dynamic', in the sense that its knowledge is growing with use. This strategy is an extension of one of the approaches discussed, for example, in Carbonell and Tomita (1985) since it implies knowledge acquisition during the exploitation stage and also involves a broader class of texts as its input. Johnson and Whitelock (1985) are also proponents of the interactive approach, but their motivation is different, in that they perceive the human to be an integral part of their system even in its final incarnation. In any case, interactivity is not the central design feature of TRANSLATOR.
Before proceeding to describe the knowledge clusters in TRANSLATOR we would like to comment very briefly on a number of methodological points concerning MT research. It seems that some of the opinions more or less commonly held by some members of the MT community may need rethinking. In what follows we list some of these opinions, together with our comments. A more detailed treatment of these topics will be given elsewhere.
This paper is based upon work supported by the National Science Foundation under Grant DCR-8407114.
* Colgate University ** Purdue University
Opinion. It is unnecessary to extract the full meaning from the SL text in order to achieve adequate MT.
Comment. An MT system can do well without (involved) semantics in many cases, but has to use meaning in the rest (or rely on human intervention). Machines, unlike humans, cannot on demand produce interpretations of text at an arbitrary depth sufficient for understanding. Therefore, if one aims at fully automatic translation, one has to prepare the system for the treatment of even very semantically involved text. One can, of course, think of designing a system that can decide how deeply each sentence can be analyzed semantically in an attempt to minimize semantic analysis. We maintain that the decision making involved is as complex as the initial problem of deep semantic analysis.
Opinion. It is not necessary to finish processing the input sentence before starting the translation. Indeed, people very often do this (consider interpreters) with very good results.
Comment. This opinion is based on introspection. The real thought processes that go on in the translators' or the interpreters' heads are not known. The (quite considerable) knowledge that the translators have about the subject of the text (speech) and about the speech situation itself prompts them to preempt the text by following their expectations concerning the most probable set of meanings for the text and deciding before the final corroboration arrives. Even if in a majority of cases this strategy works (as it is supposed to, because otherwise humans, being intelligent creatures as they are, would not have had the above expectations in the first place!), there is nothing unusual in making an error of judgement. Those of us who worked as translators surely remember multiple instances of this kind. Of course, this discussion is relative to the quality of product desired in the translation.
Opinion. Approaches to MT based on AI do not pay sufficient attention to the syntactic analysis of SL, while syntactic information is important for MT.
Comment. Syntactic structure of input conveys meaning; this meaning is extracted by the semantic analyzer with the help of syntactic knowledge. All clues are indeed used. No results of syntactic analysis are stored because they are not needed. Any approach that attempts to relate directly various syntactic structure trees between SL and TL strikes us as quite unpromising. It is only some early AI-oriented MT systems that were vulnerable to this criticism.
Opinion. IL-based approaches lead to an overkill because no peculiarities of SL (and of the relationship between, or contrastive knowledge of, SL and TL) can be used in translation. Some languages have quite a lot in common in their syntax and meaning distribution. It is wasteful not to use this additional information in translation.
Comment. While such insights can sometimes be detected and used, most of them come from human intuition, and cannot be taken advantage of in an MT system, which can hardly be considered a model of human performance. It is also totally wrong to imply, in our opinion, that discovery and implementation of those pieces of contrastive knowledge can be simpler than or, in fact, distinct from involved semantic analysis.
Opinion. With IL, the process of translation becomes one of interpretation. The structure of the SL text, when used in addition to IL in MT, governs the choice of one of the paraphrases. Moreover, again, IL is an overkill, because the paraphrases are not needed and add an element of ambiguity.
Comment. Human translators always have a few practically equally acceptable paraphrases for virtually every SL sentence. The degree of meaning similarity among the acceptable paraphrases is determined by external parameters.
The translation is executed according to the human translator's intuitive understanding of these parameters. Only in IL approaches can one control the required degree of similarity among the acceptable paraphrases as translations of an SL sentence.
Opinion. Generation of TL is a relatively simple problem for which very little or no knowledge other than lexical or syntactic is needed.
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, and the hyphen is used to separate the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, whereas in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of space leads to some s...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
A Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
The Kant System: Fast, Accurate, High-Quality Translation In Practical Domains
Knowledge-based interlingual machine translation systems produce semantically accurate translations, but typically require massive knowledge acquisition. Ongoing research and development at the Center for Machine Translation has focussed on reducing this requirement to produce large-scale practical applications of knowledge-based MT. This paper describes KANT, the first system to combine princi...
What is Example-Based Machine Translation?
We maintain that the essential feature that characterizes a Machine Translation approach and sets it apart from other approaches is the kind of knowledge it uses. From this perspective, we argue that Example-Based Machine Translation is sometimes characterized in terms of inessential features. We show that Example-Based Machine Translation, as long as it is linguistically principled, significan...